The maximum total points you can receive in this lab quiz is capped at 30.
You have ***2 hours*** to complete the tasks.
You may use web resources to scout for programming library documents, conceptual background, etc. in solving the tasks. You can also use AI tools such as LLMs. However, ***do not plagiarize*** code. Write your own code instead. I might run your submission through a plagiarism checking software and/or carry out manual checks to detect and penalize plagiarism.
Though you will have an internet connection, ***DO NOT communicate with anyone***, within the class or outside. This lab test is in part trust-based and depends on an honour system. Thus, if you are caught cheating/communicating with others (or if you are reported to do so by a classmate with credible evidence to that effect), you will be given 0 marks for the whole course; in effect, you will fail the course. Furthermore, I will report you to the university for disciplinary action for cheating.
Upload a .zip file containing your Jupyter notebook (with all the outputs) and an HTML and/or PDF export (also with the outputs) on NTUlearn through the applicable assignment link. The files should be named ***YourName .zip, .ipynb and .pdf/.html*** respectively.
If you use any AI services such as ChatGPT, please provide a clear description (at the end of this notebook) of how you used it, e.g., what prompts were used, how the response (if possible, share screenshots) was used in your solution.
Please do NOT share the test questions or solutions with anyone else, even after this semester or even your studies at NTU are over. This does not however prevent you from discussing the solutions after the test is over, among fellow students from the same cohort.
# import main modules
import pandas as pd
import numpy as np
Problem 1: (4 points)
Consider the provided ExtortionEmailCollection.html file. Create a list of Bitcoin addresses that can be found in this html file.
You may alternatively directly use https://www.u.tsukuba.ac.jp/phishing-collection/ from which the html file was created.
An accompanying BitcoinAddressFormats.pdf file contains some information regarding various Bitcoin address formats. Feel free to use other online resources to find more information about Bitcoin address formats.
from tqdm.notebook import tqdm
import pickle
import json
import re
import requests
from bs4 import BeautifulSoup
# Creating PrettyPrinter Instance
import pprint
pp = pprint.PrettyPrinter(indent=2)
# Creating a regex function to extract Bitcoin addresses
def extract_bitcoin_addresses(text):
    pattern = r'[13][a-km-zA-HJ-NP-Z0-9]{26,35}'
    bitcoin_address = re.findall(pattern, text)
    return ' '.join(bitcoin_address)
extract_bitcoin_addresses('15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR')
'15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR'
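Problem 1 asks for a list of addresses, while `extract_bitcoin_addresses` returns one space-joined string and its character class also admits `0`, which is not part of the Base58 alphabet. The following is a minimal sketch of a stricter variant, not the author's method. The helper name `find_bitcoin_addresses`, the total length of 26 to 35 characters for legacy addresses, and the Bech32 length bounds are assumptions made for illustration and should be checked against BitcoinAddressFormats.pdf.

```python
import re

# Base58 alphabet: digits and letters without 0, O, I and l.
BASE58 = "1-9A-HJ-NP-Za-km-z"
# Legacy addresses start with 1 or 3 (assumed 26 to 35 characters in total).
LEGACY = rf"\b[13][{BASE58}]{{25,34}}\b"
# Bech32 addresses start with "bc1"; the data part excludes 1, b, i and o.
# The 11 to 71 length bound is an assumption.
BECH32 = r"\bbc1[ac-hj-np-z02-9]{11,71}\b"
ADDRESS_RE = re.compile(rf"{LEGACY}|{BECH32}")


def find_bitcoin_addresses(text):
    """Return the distinct candidate addresses in text, in order of first appearance."""
    found = []
    for match in ADDRESS_RE.findall(text):
        if match not in found:
            found.append(match)
    return found


print(find_bitcoin_addresses("Pay to 15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR today"))
# ['15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR']
```

A pattern match only shows that a string looks like an address; confirming validity would additionally require a checksum check, which this sketch does not perform.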
# Your code for Problem 1 here
url = "https://www.u.tsukuba.ac.jp/phishing-collection/"
response = requests.get(url)
html_content = response.text
soup = BeautifulSoup(html_content, "html.parser")
# Find all blockquote elements; each one contains a single email
blockquotes = soup.find_all("blockquote")
bitcoin_addresses = []
# Checking for bitcoin addresses
for blockquote in blockquotes:
    blockquote_text = blockquote.find('pre').text
    blockquote_text_lower = blockquote_text.lower()
    # Check if the blockquote contains Bitcoin-related information
    if "btc" in blockquote_text_lower or 'bitcoin' in blockquote_text_lower:
        target = extract_bitcoin_addresses(blockquote_text)
        if target:
            bitcoin_addresses.append(target)
            print("Extracted Bitcoin Address:", target)
            print("Bitcoin Address Found")
Extracted Bitcoin Address: 15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR
Bitcoin Address Found
Extracted Bitcoin Address: 15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR
Bitcoin Address Found
Extracted Bitcoin Address: 18wUUSghRQJ2FJoBY9TuE9xqPooSqCvTXX
Bitcoin Address Found
Extracted Bitcoin Address: 1H1K8MfLEJgjCCfDEkTJmv9GJjD3XzEFGR
Bitcoin Address Found
Extracted Bitcoin Address: 15uBUPv1gzyDRu9psWEujr76XqjiTLZqk4
Bitcoin Address Found
Extracted Bitcoin Address: 1M3uh3QNTxVsK1MqR4cBdqajojUixCiwwq
Bitcoin Address Found
Extracted Bitcoin Address: 1Hbfkn3aPByGQFJRqS9ce26qQNEpG9rp4T
Bitcoin Address Found
Extracted Bitcoin Address: 1PS1N19snfwSE9WTjrveoKgEYEuJM99qR9
Bitcoin Address Found
# Print the addresses stored in bitcoin_addresses
for address in bitcoin_addresses:
    print(address)
15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR
15wz4Cccpwf7UKz3C6VWoAM4fJi6gKqvrR
18wUUSghRQJ2FJoBY9TuE9xqPooSqCvTXX
1H1K8MfLEJgjCCfDEkTJmv9GJjD3XzEFGR
15uBUPv1gzyDRu9psWEujr76XqjiTLZqk4
1M3uh3QNTxVsK1MqR4cBdqajojUixCiwwq
1Hbfkn3aPByGQFJRqS9ce26qQNEpG9rp4T
1PS1N19snfwSE9WTjrveoKgEYEuJM99qR9
Problem 2: (5 points)
Consider the weight-height.csv file, which has three columns indicating individuals' gender, height and weight.
While collecting the data, the respondents were advised to report their weight and height correctly. However, they were to report their gender through a randomization process, which provides privacy through plausible deniability.
Consider the following random response mechanism. The respondent flips a balanced coin. If the coin flip results in a HEAD, then the respondent provides the TRUE answer regarding their gender. If the coin flip results in a TAIL, then the respondent provides Male or Female as responses with equal probability, i.e., 0.5.
Determine an estimate for the actual number of male and female respondents, and explain your approach in the discussion place-holder below the code place-holder.
# Your code for Problem 2 here
df = pd.read_csv('weight-height.csv')
df
| | Gender | Height | Weight |
|---|---|---|---|
| 0 | Male | 73.847017 | 241.893563 |
| 1 | Male | 68.781904 | 162.310473 |
| 2 | Male | 74.110105 | 212.740856 |
| 3 | Male | 71.730978 | 220.042470 |
| 4 | Male | 69.881796 | 206.349801 |
| ... | ... | ... | ... |
| 9995 | Female | 66.172652 | 136.777454 |
| 9996 | Female | 67.067155 | 170.867906 |
| 9997 | Female | 63.867992 | 128.475319 |
| 9998 | Female | 69.034243 | 163.852461 |
| 9999 | Female | 61.944246 | 113.649103 |
10000 rows × 3 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Gender  10000 non-null  object
 1   Height  10000 non-null  float64
 2   Weight  10000 non-null  float64
dtypes: float64(2), object(1)
memory usage: 234.5+ KB
male_respondents = df[df['Gender'] == 'Male']
female_respondents = df[df['Gender'] == 'Female']
print(len(male_respondents), len(female_respondents))
5000 5000
def estimate_genders(total_respondents, reported_males):
    # Solving for M and F
    actual_males = (reported_males - 0.25 * total_respondents) / 0.5
    actual_females = total_respondents - actual_males
    return actual_males, actual_females
num_respondents = 10000
male_respondents = 5000
estimate_genders(num_respondents, male_respondents)
(5000.0, 5000.0)
Discussion place-holder: Your justification and the analysis of your result is to be discussed here.
Answer:
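Probabilities based on the random response mechanism, where $N$ is the total number of respondents, $M$ and $F$ are the actual numbers of males and females, and $R_M$ is the number of respondents who reported Male:

$$P(\text{report Male} \mid \text{Male}) = 0.5 + 0.5 \times 0.5 = 0.75, \qquad P(\text{report Male} \mid \text{Female}) = 0.5 \times 0.5 = 0.25$$

Based on the equations above, and using $F = N - M$, we get:

$$R_M = 0.75\,M + 0.25\,F = 0.5\,M + 0.25\,N \;\Longrightarrow\; M = \frac{R_M - 0.25\,N}{0.5} \qquad (1)$$

$$F = N - M \qquad (2)$$

Hence, I created a function that implements equation (1) to estimate the actual number of males, and used equation (2) to obtain the actual number of females by subtracting the estimated number of males from the total number of respondents. With 5000 reported males out of 10000 respondents, the estimate is 5000 males and 5000 females.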
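The estimator used in `estimate_genders` can be sanity-checked with a small simulation of the coin-flip mechanism described in the problem statement. The sketch below is illustrative only: the 7000/3000 split and the helper names `randomized_response` and `estimate_true_males` are assumptions chosen for the demonstration, not values from weight-height.csv.

```python
import random


def randomized_response(true_gender, rng):
    """Report the truth on HEAD; on TAIL report Male or Female with equal probability."""
    if rng.random() < 0.5:  # HEAD
        return true_gender
    return "Male" if rng.random() < 0.5 else "Female"  # TAIL


def estimate_true_males(reported_males, total_respondents):
    """Invert reported = 0.5 * actual + 0.25 * total, as in estimate_genders."""
    return (reported_males - 0.25 * total_respondents) / 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population, made up for this check.
true_genders = ["Male"] * 7000 + ["Female"] * 3000
reports = [randomized_response(gender, rng) for gender in true_genders]

reported_males = reports.count("Male")
estimated_males = estimate_true_males(reported_males, len(reports))
print("reported males:", reported_males)
print("estimated actual males:", round(estimated_males))
```

The reported count should land near 6000 and the estimate near 7000. The estimate is noisy because the random answers add variance, so it is an estimate of the true count rather than an exact recovery.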
Problem 3: (8 points)
Consider the list of countries elected as members of the security council: https://www.un.org/securitycouncil/content/countries-elected-members
Consider also the list of countries that have never been in the security council: https://www.un.org/securitycouncil/content/countries-never-elected-members-security-council
Create a (set of) visualization(s) that encode in a consolidated manner the various kinds of information available across these two webpages.
Consider (and explain in the accompanying Discussion place-holder) why you have chosen a given visualization technique, and how you have ensured it has low lie-factor, high data-ink ratio while also meeting accessibility and aesthetics requirements.
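The lie factor and data-ink ratio mentioned above are Tufte's measures: the lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data, and the data-ink ratio is the share of a graphic's ink that presents data. A small sketch of both calculations follows; the function names and the example numbers are hypothetical and only serve to illustrate the formulas.

```python
def effect_size(first, second):
    """Relative change from first to second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data (ideal value: 1)."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


def data_ink_ratio(data_ink, total_ink):
    """Share of the ink that presents data (closer to 1 is better)."""
    return data_ink / total_ink


# Made-up example: values 10 and 12 drawn as bars on an axis that starts at 9,
# so the bars appear 1 and 3 units tall.
print(lie_factor(1.0, 3.0, 10.0, 12.0))
print(data_ink_ratio(30.0, 40.0))
```

In the example the bars suggest a 200% increase while the data changed by 20%, which is why truncated axes inflate the lie factor.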
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
elected_url = 'https://www.un.org/securitycouncil/content/countries-elected-members'
non_elected_url = 'https://www.un.org/securitycouncil/content/countries-never-elected-members-security-council'
Below, I use Selenium to scrape the full list of countries, which is quicker than copying and pasting it from the website.
# Your code for Problem 3
# Using Selenium instead of Requests for the UN Security Council page
driver = webdriver.Chrome()
driver.get(elected_url)
elected_countries = []
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
strong_container = soup.find_all('strong')
for element in strong_container:
    elected_countries.append(element.get_text())
    print(element.get_text())
driver.quit()
Albania Algeria Angola Argentina Australia Austria Azerbaijan Bahrain Bangladesh Belarus Belgium Benin Bolivia Bosnia and Herzegovina Botswana Brazil Bulgaria Burkina Faso Burundi Cabo Verde Cameroon Canada Ceylon Chad Chile China Colombia Congo Costa Rica Côte d'Ivoire Croatia Cuba Czech Republic Czechoslovakia Democratic Republic of the Congo Denmark Djibouti Dominican Republic Ecuador Egypt Equatorial Guinea Estonia Ethiopia Finland France Gabon Gambia German Democratic Republic Germany Ghana Greece Guatemala Guinea Guinea-Bissau Guyana Honduras Hungary India Indonesia Iran (Islamic Republic of) Iraq Ireland Italy Ivory Coast Jamaica Japan Jordan Kazakhstan Kenya Kuwait Lebanon Liberia Libyan Arab Jamahiriya Lithuania Luxembourg Madagascar Malaysia Mali Malta Mauritania Mauritius Mexico Morocco Mozambique Namibia Nepal the Netherlands New Zealand Nicaragua Niger Nigeria Norway Oman Pakistan Panama Paraguay Peru Philippines Poland Portugal Qatar Republic of Korea Romania Russian Federation Rwanda Saint Vincent and the Grenadines Saudi Arabia Senegal Sierra Leone Singapore Slovakia Slovenia Somalia South Africa Spain Sri Lanka Sudan Sweden Switzerland Syrian Arab Republic Thailand Togo Trinidad and Tobago Tunisia Türkiye Uganda Ukraine Union of Soviet Socialist Republics United Arab Emirates United Arab Republic United Kingdom of Great Britain and Northern Ireland United Republic of Tanzania United States of America Uruguay Venezuela (Bolivarian Republic of) Viet Nam Yemen Yugoslavia Zaire Zambia Zimbabwe Security Council Member Dashboard Photos of Security Council Members
text_1 = """
Albania
Algeria
Angola
Argentina
Australia
Austria
Azerbaijan
Bahrain
Bangladesh
Belarus
Belgium
Benin
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
Bulgaria
Burkina Faso
Burundi
Cabo Verde
Cameroon
Canada
Ceylon
Chad
Chile
China
Colombia
Congo
Costa Rica
Côte d'Ivoire
Croatia
Cuba
Czech Republic
Czechoslovakia
Democratic Republic of the Congo
Denmark
Djibouti
Dominican Republic
Ecuador
Egypt
Equatorial Guinea
Estonia
Ethiopia
Finland
France
Gabon
Gambia
German Democratic Republic
Germany
Ghana
Greece
Guatemala
Guinea
Guinea-Bissau
Guyana
Honduras
Hungary
India
Indonesia
Iran (Islamic Republic of)
Iraq
Ireland
Italy
Ivory Coast
Jamaica
Japan
Jordan
Kazakhstan
Kenya
Kuwait
Lebanon
Liberia
Libyan Arab Jamahiriya
Lithuania
Luxembourg
Madagascar
Malaysia
Mali
Malta
Mauritania
Mauritius
Mexico
Morocco
Mozambique
Namibia
Nepal
the Netherlands
New Zealand
Nicaragua
Niger
Nigeria
Norway
Oman
Pakistan
Panama
Paraguay
Peru
Philippines
Poland
Portugal
Qatar
Republic of Korea
Romania
Russian Federation
Rwanda
Saint Vincent and the Grenadines
Saudi Arabia
Senegal
Sierra Leone
Singapore
Slovakia
Slovenia
Somalia
South Africa
Spain
Sri Lanka
Sudan
Sweden
Switzerland
Syrian Arab Republic
Thailand
Togo
Trinidad and Tobago
Tunisia
Türkiye
Uganda
Ukraine
Union of Soviet Socialist Republics
United Arab Emirates
United Arab Republic
United Kingdom of Great Britain and Northern Ireland
United Republic of Tanzania
United States of America
Uruguay
Venezuela (Bolivarian Republic of)
Viet Nam
Yemen
Yugoslavia
Zaire
Zambia
Zimbabwe
"""
elected_countries = text_1.strip().split('\n')
print(elected_countries)
['Albania', 'Algeria ', 'Angola ', 'Argentina ', 'Australia ', 'Austria ', 'Azerbaijan ', 'Bahrain ', 'Bangladesh ', 'Belarus ', 'Belgium ', 'Benin ', 'Bolivia ', 'Bosnia and Herzegovina ', 'Botswana ', 'Brazil ', 'Bulgaria ', 'Burkina Faso ', 'Burundi ', 'Cabo Verde ', 'Cameroon ', 'Canada ', 'Ceylon', 'Chad ', 'Chile ', 'China', 'Colombia ', 'Congo ', 'Costa Rica ', "Côte d'Ivoire ", 'Croatia ', 'Cuba ', 'Czech Republic ', 'Czechoslovakia ', 'Democratic Republic of the Congo ', 'Denmark ', 'Djibouti ', 'Dominican Republic ', 'Ecuador ', 'Egypt ', 'Equatorial Guinea ', 'Estonia', 'Ethiopia ', 'Finland ', 'France', 'Gabon ', 'Gambia ', 'German Democratic Republic', 'Germany ', 'Ghana ', 'Greece ', 'Guatemala ', 'Guinea ', 'Guinea-Bissau ', 'Guyana ', 'Honduras ', 'Hungary ', 'India ', 'Indonesia ', 'Iran (Islamic Republic of) ', 'Iraq ', 'Ireland ', 'Italy ', 'Ivory Coast ', 'Jamaica ', 'Japan ', 'Jordan ', 'Kazakhstan ', 'Kenya ', 'Kuwait ', 'Lebanon ', 'Liberia ', 'Libyan Arab Jamahiriya ', 'Lithuania ', 'Luxembourg ', 'Madagascar ', 'Malaysia ', 'Mali ', 'Malta ', 'Mauritania ', 'Mauritius ', 'Mexico ', 'Morocco ', 'Mozambique', 'Namibia ', 'Nepal ', 'the Netherlands ', 'New Zealand ', 'Nicaragua ', 'Niger ', 'Nigeria ', 'Norway ', 'Oman ', 'Pakistan ', 'Panama ', 'Paraguay ', 'Peru ', 'Philippines ', 'Poland ', 'Portugal ', 'Qatar ', 'Republic of Korea ', 'Romania ', 'Russian Federation', 'Rwanda ', 'Saint Vincent and the Grenadines', 'Saudi Arabia', 'Senegal ', 'Sierra Leone ', 'Singapore ', 'Slovakia ', 'Slovenia ', 'Somalia ', 'South Africa ', 'Spain ', 'Sri Lanka ', 'Sudan ', 'Sweden ', 'Switzerland', 'Syrian Arab Republic ', 'Thailand ', 'Togo ', 'Trinidad and Tobago ', 'Tunisia ', 'Türkiye', 'Uganda ', 'Ukraine ', 'Union of Soviet Socialist Republics ', 'United Arab Emirates ', 'United Arab Republic ', 'United Kingdom of Great Britain and Northern Ireland', 'United Republic of Tanzania ', 'United States of America', 'Uruguay ', 'Venezuela (Bolivarian 
Republic of) ', 'Viet Nam ', 'Yemen ', 'Yugoslavia ', 'Zaire', 'Zambia ', 'Zimbabwe']
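The printed list shows that many entries keep a trailing space (for example `'Algeria '`), and that some countries appear under both a historical and a current name. A minimal sketch of a clean-up step is given below. The helper `clean_country_names` and the `LEGACY_ALIASES` mapping are hypothetical additions, and the mapping is deliberately incomplete.

```python
def clean_country_names(raw_block, aliases=None):
    """Split a multi-line block into stripped, de-duplicated country names."""
    aliases = aliases or {}
    cleaned = []
    for line in raw_block.splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        name = aliases.get(name, name)  # map a historical name to its current one
        if name not in cleaned:
            cleaned.append(name)
    return cleaned


# Illustrative, incomplete mapping of historical names (an assumption, not from the UN page).
LEGACY_ALIASES = {
    "Ceylon": "Sri Lanka",
    "Zaire": "Democratic Republic of the Congo",
    "Ivory Coast": "Côte d'Ivoire",
}

sample = "Albania\nAlgeria \nCeylon\nSri Lanka \n\nZaire"
print(clean_country_names(sample, LEGACY_ALIASES))
# ['Albania', 'Algeria', 'Sri Lanka', 'Democratic Republic of the Congo']
```

Whether to merge historical names depends on what the visualization should show; merging loses the distinction between, say, Ceylon's and Sri Lanka's terms.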
# Using Selenium instead of Requests for the UN Security Council page
driver = webdriver.Chrome()
driver.get(non_elected_url)
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
li_elements = soup.find_all('li')
for li_element in li_elements:
    print(li_element.get_text())
# Close the browser once the page source has been parsed
driver.quit()
text_2 = """
Afghanistan
Andorra
Antigua and Barbuda
Armenia
Bahamas
Barbados
Belize
Bhutan
Brunei Darussalam
Cambodia
Central African Republic
Comoros
Cyprus
Democratic People's Republic of Korea
Dominica
El Salvador
Eritrea
Fiji
Georgia
Grenada
Haiti
Iceland
Israel
Kiribati
Kyrgyzstan
Lao People's Democratic Republic
Latvia
Lesotho
Liechtenstein
Malawi
Maldives
Marshall Islands
Micronesia (Federated States of)
Monaco
Mongolia
Montenegro
Myanmar
Nauru
North Macedonia
Palau
Papua New Guinea
Republic of Moldova
Saint Kitts and Nevis
Saint Lucia
Samoa
San Marino
Sao Tome and Principe
Serbia
Seychelles
Solomon Islands
South Sudan
Suriname
Swaziland
Tajikistan
Timor-Leste
Tonga
Turkmenistan
Tuvalu
Uzbekistan
Vanuatu
"""
non_elected_countries = text_2.strip().split('\n')
print(non_elected_countries)
['Afghanistan', 'Andorra', 'Antigua and Barbuda', 'Armenia', 'Bahamas', 'Barbados', 'Belize', 'Bhutan', 'Brunei Darussalam', 'Cambodia', 'Central African Republic', 'Comoros', 'Cyprus', "Democratic People's Republic of Korea", 'Dominica', 'El Salvador', 'Eritrea', 'Fiji', 'Georgia', 'Grenada', 'Haiti', 'Iceland', 'Israel', 'Kiribati', 'Kyrgyzstan', "Lao People's Democratic Republic", 'Latvia', 'Lesotho', 'Liechtenstein', 'Malawi', 'Maldives', 'Marshall Islands', 'Micronesia (Federated States of)', 'Monaco', 'Mongolia', 'Montenegro', 'Myanmar', 'Nauru', 'North Macedonia', 'Palau', 'Papua New Guinea', 'Republic of Moldova', 'Saint Kitts and Nevis', 'Saint Lucia', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Serbia', 'Seychelles', 'Solomon Islands', 'South Sudan', 'Suriname', 'Swaziland', 'Tajikistan', 'Timor-Leste', 'Tonga', 'Turkmenistan', 'Tuvalu', 'Uzbekistan', 'Vanuatu']
import plotly.express as px
import plotly.graph_objs as go
# Create a DataFrame with all countries and their election status
countries_df = pd.DataFrame({
    'Country': elected_countries + non_elected_countries,
    'Elected': ['Yes'] * len(elected_countries) + ['No'] * len(non_elected_countries)
})
fig = go.Figure(data=go.Choropleth(
    locations=countries_df['Country'],
    locationmode='country names',
    z=countries_df['Elected'].astype('category').cat.codes,
    text=countries_df['Country'],
    colorscale=['red', 'lightgreen'],
    colorbar_title='Elected to UN',
    marker_line_color='darkgray',
    marker_line_width=0.5,
))
fig.update_layout(
    title_text='UN Election Status of Countries since 1946',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
)
fig.show()
Discussion place-holder for Problem 3.
Question: Consider (and explain in the accompanying Discussion place-holder) why you have chosen a given visualization technique, and how you have ensured it has low lie-factor, high data-ink ratio while also meeting accessibility and aesthetics requirements.
Answer:
Problem 4: (3+5=8 points: This is a multi-part question)
Consider the data provided in Salary_Data_gender_and_race.csv
Problem 4.1: For the employees in the USA, use a suitable visualization tool to compare the salary of people whose race is identified as "White" versus those who are not "White".
Problem 4.2: Using a suitable test, determine whether there is any statistically significant (with at least 95% confidence) difference in salary across Male and Female employees identified as non-White in the USA, for the provided dataset.
In the explanation part, clearly state your null and alternative hypotheses, explain which test you chose to use, why it is suitable, and elaborate the interpretation of the test result.
import matplotlib.pyplot as plt
import seaborn as sns
# Your code for Problem 4.1
salary_df = pd.read_csv('Salary_Data_gender_and_race.csv')
salary_df.head()
| | Unnamed: 0 | Age | Gender | Education Level | Job Title | Years of Experience | Salary | Country | Race |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 32 | Male | Bachelor's | Software Engineer | 5.0 | $90,000.00 | UK | White |
| 1 | 1 | 28 | Female | Master's | Data Analyst | 3.0 | $65,000.00 | USA | Hispanic |
| 2 | 2 | 45 | Male | PhD | Senior Manager | 15.0 | $150,000.00 | Canada | White |
| 3 | 3 | 36 | Female | Bachelor's | Sales Associate | 7.0 | $60,000.00 | USA | Hispanic |
| 4 | 4 | 52 | Male | Master's | Director | 20.0 | $200,000.00 | USA | Asian |
salary_df['Race'].unique()
array(['White', 'Hispanic', 'Asian', 'Korean', 'Chinese', 'Australian',
'Welsh', 'African American', 'Mixed', 'Black'], dtype=object)
# Replace categories that are not 'White' with 'Not White'
salary_df['Race Category'] = salary_df['Race'].apply(lambda x: 'Not White' if x != 'White' else x)
salary_df
| | Unnamed: 0 | Age | Gender | Education Level | Job Title | Years of Experience | Salary | Country | Race | Race Category |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 32 | Male | Bachelor's | Software Engineer | 5.0 | $90,000.00 | UK | White | White |
| 1 | 1 | 28 | Female | Master's | Data Analyst | 3.0 | $65,000.00 | USA | Hispanic | Not White |
| 2 | 2 | 45 | Male | PhD | Senior Manager | 15.0 | $150,000.00 | Canada | White | White |
| 3 | 3 | 36 | Female | Bachelor's | Sales Associate | 7.0 | $60,000.00 | USA | Hispanic | Not White |
| 4 | 4 | 52 | Male | Master's | Director | 20.0 | $200,000.00 | USA | Asian | Not White |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6695 | 6699 | 49 | Female | PhD | Director of Marketing | 20.0 | $200,000.00 | UK | Mixed | Not White |
| 6696 | 6700 | 32 | Male | High School | Sales Associate | 3.0 | $50,000.00 | Australia | Australian | Not White |
| 6697 | 6701 | 30 | Female | Bachelor's Degree | Financial Manager | 4.0 | $55,000.00 | China | Chinese | Not White |
| 6698 | 6702 | 46 | Male | Master's Degree | Marketing Manager | 14.0 | $140,000.00 | China | Korean | Not White |
| 6699 | 6703 | 26 | Female | High School | Sales Executive | 1.0 | $35,000.00 | Canada | Black | Not White |
6700 rows × 10 columns
category_counts = salary_df['Race Category'].value_counts()
plt.figure(figsize = (8,6))
colors = ['lightgreen' if category == 'White' else 'salmon' for category in category_counts.index]
plt.bar(category_counts.index, category_counts.values, color=colors)
plt.xlabel('Category')
plt.ylabel('Count')
plt.title('Count Plot of White vs Not White')
plt.show()
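The bar chart above compares how many rows fall in each category across all countries, whereas Problem 4.1 asks for a comparison of salaries among USA employees. A minimal sketch of one way to do that is given below, assuming the column names shown in the table above. The helper `prepare_usa_salaries` and the rows in `toy_df` are made up for illustration and are not part of the provided dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt


def prepare_usa_salaries(df):
    """Keep USA rows, parse Salary to float, and label White vs Not White."""
    usa = df[df["Country"] == "USA"].copy()
    # Strip "$" and "," so that "$90,000.00" becomes 90000.0
    usa["Salary"] = usa["Salary"].str.replace(r"[$,]", "", regex=True).astype(float)
    # Keep "White" as is, relabel every other value as "Not White"
    usa["Race Category"] = usa["Race"].where(usa["Race"] == "White", "Not White")
    return usa


# Made-up rows in the same shape as the dataset, for illustration only.
toy_df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "USA", "UK"],
    "Race": ["White", "Asian", "White", "Hispanic", "White"],
    "Salary": ["$90,000.00", "$65,000.00", "$110,000.00", "$60,000.00", "$150,000.00"],
})

usa_df = prepare_usa_salaries(toy_df)
medians = usa_df.groupby("Race Category")["Salary"].median()
print(medians)

# One box per group puts the two salary distributions side by side.
labels = sorted(usa_df["Race Category"].unique())
groups = [usa_df.loc[usa_df["Race Category"] == lab, "Salary"].to_numpy() for lab in labels]
fig, ax = plt.subplots(figsize=(8, 6))
ax.boxplot(groups)
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_xlabel("Race Category")
ax.set_ylabel("Salary (USD)")
ax.set_title("Salary of USA employees: White vs Not White")
plt.show()
```

Replacing `toy_df` with `salary_df` would apply the same steps to the real data. A box plot is only one reasonable choice here; a violin plot or overlaid histograms would serve the same purpose.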
# Your code for Problem 4.2
from scipy import stats
# First, check whether distributions of Non White Females and Males are normally distributed
filtered_data = salary_df.copy()
filtered_data = filtered_data[(filtered_data['Country'] == 'USA') & (filtered_data['Race Category'] == 'Not White')]
# Remove non-numeric characters like $ and commas
filtered_data['Salary'] = filtered_data['Salary'].replace(r'[\$,]', '', regex=True).astype(float)
male_salaries = filtered_data[filtered_data['Gender'] == 'Male']['Salary']
female_salaries = filtered_data[filtered_data['Gender'] == 'Female']['Salary']
# Shapiro-Wilk Test
w_statistic_male, p_value_male = stats.shapiro(male_salaries)
w_statistic_female, p_value_female = stats.shapiro(female_salaries)
print(f'Male Salaries - W-statistic: {w_statistic_male}, P-value: {p_value_male}')
print(f'Female Salaries - W-statistic: {w_statistic_female}, P-value: {p_value_female}')
Male Salaries - W-statistic: 0.9530461430549622, P-value: 3.0192868604589362e-12
Female Salaries - W-statistic: 0.964576244354248, P-value: 4.774281414654524e-09
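Both P-values are far below 0.05, so the Shapiro-Wilk test rejects normality for both groups. A two-tailed t-test, which assumes normally distributed samples, is therefore not appropriate.
Hence, a Mann-Whitney U test will be conducted on the two distributions, because it is non-parametric and compares two independent samples without assuming normality.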
from scipy.stats import mannwhitneyu
# Perform the Mann-Whitney U test
u_statistic, p_value = mannwhitneyu(male_salaries, female_salaries)
print(f'U-statistic: {u_statistic}, P-value: {p_value}')
U-statistic: 145635.5, P-value: 2.2931749325015127e-05
Discussion place-holder for Problem 4.2
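As a hedged starting point for this discussion, one possible framing is: the null hypothesis (H0) is that the salary distributions of Male and Female non-White employees in the USA are the same, and the alternative hypothesis (H1) is that they differ. The sketch below applies the usual decision rule at a 5% significance level to the P-value printed above. The helper name `interpret_two_sided_test` is made up for illustration.

```python
ALPHA = 0.05  # significance level matching "at least 95% confidence"


def interpret_two_sided_test(p_value, alpha=ALPHA):
    """Return the decision on H0 for a two-sided test at level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# P-value copied from the Mann-Whitney U output above.
p_value = 2.2931749325015127e-05
decision = interpret_two_sided_test(p_value)
print(f"p = {p_value:.3g}, alpha = {ALPHA}: {decision}")
```

Under this framing the P-value is well below 0.05, so H0 is rejected and the difference in salary between the two groups is statistically significant. The test alone does not say which group earns more; comparing the group medians would show the direction.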
Problem 5: (5 points)
Consider the interaction graph below, based on games played by various football teams. Use a suitable community detection (clustering) algorithm to determine the number of meaningful communities, and the membership of the football teams in those communities. Provide the final answer in the form of a list of lists, where each sublist indicates the membership of an individual community (by including the names of the teams). Provide a very brief explanation of the rationale for your choice of community detection algorithm.
import urllib.request
import io
import zipfile
import matplotlib.pyplot as plt
import networkx as nx
url = "http://www-personal.umich.edu/~mejn/netdata/football.zip"
sock = urllib.request.urlopen(url) # open URL
s = io.BytesIO(sock.read()) # read into BytesIO "file"
sock.close()
zf = zipfile.ZipFile(s) # zipfile object
txt = zf.read("football.txt").decode() # read info file
gml = zf.read("football.gml").decode() # read gml data
gml = gml.split("\n")[1:]
G = nx.parse_gml(gml) # parse gml data
# print name of all the teams
for n in G.nodes():
    print(f"{n:20}")
BrighamYoung FloridaState Iowa KansasState NewMexico TexasTech PennState SouthernCalifornia ArizonaState SanDiegoState Baylor NorthTexas NorthernIllinois Northwestern WesternMichigan Wisconsin Wyoming Auburn Akron VirginiaTech Alabama UCLA Arizona Utah ArkansasState NorthCarolinaState BallState Florida BoiseState BostonCollege WestVirginia BowlingGreenState Michigan Virginia Buffalo Syracuse CentralFlorida GeorgiaTech CentralMichigan Purdue Colorado ColoradoState Connecticut EasternMichigan EastCarolina Duke FresnoState OhioState Houston Rice Idaho Washington Kansas SouthernMethodist Kent Pittsburgh Kentucky Louisville LouisianaTech LouisianaMonroe Minnesota MiamiOhio Vanderbilt MiddleTennesseeState Illinois MississippiState Memphis Nevada Oregon NewMexicoState SouthCarolina Ohio IowaState SanJoseState Nebraska SouthernMississippi Tennessee Stanford WashingtonState Temple Navy TexasA&M NotreDame TexasElPaso Oklahoma Toledo Tulane Mississippi Tulsa NorthCarolina UtahState Army Cincinnati AirForce Rutgers Georgia LouisianaState LouisianaLafayette Texas Marshall MichiganState MiamiFlorida Missouri Clemson NevadaLasVegas WakeForest Indiana OklahomaState OregonState Maryland TexasChristian California AlabamaBirmingham Arkansas Hawaii
communities = list(nx.algorithms.community.greedy_modularity_communities(G))
#!pip install python-louvain
colors = []
for node in G:
    for i in range(len(communities)):
        if node in communities[i]:
            colors.append(i)
fig = plt.figure(figsize=(20,10))
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, cmap=plt.cm.Paired, node_color=colors)
plt.show()
# Convert each frozenset to a list and store in a new list
communities_list = [list(community) for community in communities]
# Print each community
for i, community in enumerate(communities_list):
    print(f"Community {i + 1}:")
    print(", ".join(community))
    print("=========================================================================================================")
Community 1:
LouisianaMonroe, MississippiState, Tulane, Army, Auburn, SouthernMississippi, Tennessee, Florida, LouisianaLafayette, MiddleTennesseeState, Alabama, LouisianaState, Mississippi, CentralFlorida, Louisville, Houston, AlabamaBirmingham, Memphis, Arkansas, SouthCarolina, Georgia, Connecticut, Kentucky, Cincinnati, LouisianaTech, EastCarolina, Vanderbilt
=========================================================================================================
Community 2:
Stanford, Arizona, NevadaLasVegas, Oregon, ColoradoState, Rice, Nevada, UCLA, California, ArizonaState, SouthernMethodist, WashingtonState, OregonState, Utah, SanJoseState, Tulsa, Wyoming, Washington, SanDiegoState, Hawaii, BrighamYoung, FresnoState, TexasChristian, SouthernCalifornia, AirForce
=========================================================================================================
Community 3:
NewMexicoState, IowaState, Iowa, Oklahoma, NewMexico, TexasA&M, Kansas, BoiseState, NorthTexas, ArkansasState, Missouri, Nebraska, Texas, KansasState, OklahomaState, Idaho, Colorado, TexasTech, TexasElPaso, Baylor, UtahState
=========================================================================================================
Community 4:
Pittsburgh, NorthCarolina, Navy, VirginiaTech, BostonCollege, FloridaState, NorthCarolinaState, WestVirginia, Syracuse, Rutgers, WakeForest, Maryland, NotreDame, Temple, Clemson, Virginia, GeorgiaTech, MiamiFlorida, Duke
=========================================================================================================
Community 5:
BowlingGreenState, Akron, Ohio, MiamiOhio, NorthernIllinois, BallState, WesternMichigan, Kent, Buffalo, Marshall, CentralMichigan, EasternMichigan, Toledo
=========================================================================================================
Community 6:
OhioState, Michigan, Indiana, Illinois, MichiganState, Purdue, PennState, Wisconsin, Northwestern, Minnesota
=========================================================================================================
Discussion place-holder for Problem 5 Answer:
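One way to support the rationale for greedy modularity maximisation is to report the modularity score of the partition it returns, since that is the quantity the algorithm tries to maximise. The sketch below illustrates this on a made-up toy graph (two groups of four hypothetical teams that all play each other, joined by a single cross-group game), not on the football data. The names `G_toy`, `group_a` and `group_b` are assumptions for illustration.

```python
from itertools import combinations

import networkx as nx

# Hypothetical teams: everyone plays everyone inside a group,
# and a single game ("A4" vs "B1") links the two groups.
group_a = ["A1", "A2", "A3", "A4"]
group_b = ["B1", "B2", "B3", "B4"]

G_toy = nx.Graph()
G_toy.add_edges_from(combinations(group_a, 2))
G_toy.add_edges_from(combinations(group_b, 2))
G_toy.add_edge("A4", "B1")

# Greedy modularity maximisation, the same call used on the football graph.
found = nx.algorithms.community.greedy_modularity_communities(G_toy)

# Final answer as a list of lists, sorted so the output order is stable.
membership = sorted(sorted(community) for community in found)
print(membership)

# Modularity of the partition: higher values mean denser within-community links
# than expected by chance.
score = nx.algorithms.community.modularity(G_toy, found)
print(f"Number of communities: {len(found)}, modularity: {score:.3f}")
```

The same two calls applied to `G` and `communities` would give the modularity of the six-community partition printed above, which could then be compared against the partition produced by another algorithm.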
Provide details of use of AI tools such as ChatGPT.
The following are the resources I obtained from ChatGPT.
I referred to ChatGPT for Problems 2 and 3 when forming my own code.
Problem 2: reference for the calculation of probability.